Brownsys pivottrace 210 beta #1

Open
wants to merge 115 commits into
base: branch-2.1.0-beta

Conversation

JonathanMace
Member

This pull request simply contains the aggregate changes from our use of this repository for xtrace / retro / pivottracing. The code is out of date with respect to any of those libraries, but illustrates where we made changes.

JonathanMace and others added 30 commits September 1, 2014 16:27
Removed unnecessary import

Instrumented the WritableRpcEngine.  I'm not sure where this is used (as
opposed to the Protobuf engine) but Todd's htrace instrumentation chose
to instrument this class too, so I might as well include it.

Add XTrace metadata to the data transfer header protos

Start tracing commands when initiated from the filesystem.

Changed the DataNode receiver to start traces or join traces.

Added XTrace metadata to the block and packet transfer headers

Instrumented the sender/receiver for downloading files from HDFS

Add check for xtrace metadata

Move start trace to the DataXceiver and add a little instrumentation for
writeblock.

Add XTrace instrumentation to pipeline write acks

Instrument code that mirrors data when pipelining to in-place modify the
xtrace metadata

Removed unnecessary instrumentation

Minor mistakes in the xtrace log statements

Tiny bits of additional instrumentation

Changed location of 'close' log messages, since close can be called
multiple times, we only want it logged when the stream is first closed.
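The log-once behavior described in this commit can be sketched as follows. This is a minimal illustration, not the actual DFSOutputStream change: the class name `LogOnceStream` and the counter standing in for the X-Trace log call are hypothetical.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stream whose close() may be called many times but is logged once. */
final class LogOnceStream implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Stands in for the tracing log call; counts how many close events were emitted.
    private final AtomicInteger closeEvents = new AtomicInteger();

    @Override
    public void close() {
        // compareAndSet succeeds only for the first caller, so repeated
        // or concurrent close() calls do not emit a second event.
        if (closed.compareAndSet(false, true)) {
            closeEvents.incrementAndGet();
        }
    }

    int closeEvents() {
        return closeEvents.get();
    }
}
```

Guarding the log statement with the same flag that marks the stream closed keeps the trace free of duplicate close events without changing the semantics of close() itself.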

Finished implementation of file writes from client side; mostly
implementation in DFSOutputStream

Instrumentation bugfix

Instrumented a couple more protobuf messages, added a utility class to
put/retrieve XTrace contexts from those protobuf messages, and modified
a bunch of classes to call the utility methods rather than the protobuf
newBuilder methods.

Fixed a context joining problem in the block receiver

Instrumented start and end of many DFSClient api function calls

Added some resource tracing events

Added an extra log statement to the DFSOutputStream

Added log statement to indicate that we're forcing a namenode block
update

Added some instrumentation of some of the main locks used on the
namenode.

Removed some annoying unnecessary RPC log messages, and added a 'name'
tag to name the spans in the rpc invoker.

Changed the startTrace logpoints in DataXceiver to just logEvents.
Starting traces here isn't such a good idea; background tasks end up
accidentally starting traces when really we're not interested in them.

Use XTraceResourceTracing branching API to log the explicit computation
boundaries when kicking off new threads.  Also, add some names when
setting XTraceContext to give spans names.

Slightly modify server to extend the boundary of where the xtrace
context is cleared.

Removed all resource tracing stuff, which will be moved to a new
branch.

Added dependency to XResourceTracing

Temporarily disabled code to propagate metadata between replicas, as it seems to be causing problems

Fixed an XTrace logging bug

Fix for the mysterious bug that was causing checksum inconsistencies.  Root cause was a bug in HDFS source code unrelated to X-Trace instrumentation.  Have fixed the bug.

Some small instrumentation tweaks

Added XResourceTracing as a bootclasspath option

Added the necessary command line arguments to put xresourcetracing on the bootclasspath, but left them commented out for now

We can now put xresourcetracing on the bootclasspath.  It only takes effect if xresourcetracing was built to weave rt.jar

Use a more accurate estimate for the PacketHeader size, now taking into account the fact that XTrace metadata can propagate options which might increase the PacketHeader size

Removed some hard-coded xtrace context passing that is now handled generically by the AspectJ instrumentation

Clear the thread context in lease renewer, which is a long-lived thread and shouldn't be attributed to the first task that kicks it off

Fix to add causality when the sending thread has to wait to receive ACKs

Fixed a bug - using less than instead of greater than

Removed and modified some of the XTraceContext.startTrace events, because they were polluting trace tags and annoying the hell out of me

Commented out the inclusion of resource tracing on the bootclasspath; it limits the classes that can be turned on and off in xtrace config.  For now, we aren't using rt.jar instrumentation, and therefore no reason for these to be on the bootclasspath

Commented out bootclasspath stuff, since for now we don't want or need it

More network instrumentation for RPC calls

Added a few entries to the default hdfs config; by default, there are a few things we want turned off

Testing the write speed with a lower cache drop behind buffer lag

Applied broken pom patch from HADOOP-10110; hasn't affected us thus far but sandbox build was failing

Temporary instrumentation adding additional events to DataXceiver

Undo previous commit

Add some more temporary logging

Removed some logging that was temporary

Revert "Removed some logging that was temporary"

This reverts commit d8d8727.

Migration from X-Trace 2.0 to X-Trace 3.0.  Preliminary commit of untested instrumentation

Small fix

Peer cache is long lived, don't attribute to first task we see

Whoops... didn't commit this properly...

Add a few log messages to NativeIO, why not

Instrument call queue in IPC Server

Slight changes to the RPC server and client response sending/processing threads, to make sure the correct XTrace metadata is set at all times

Add ability to have connection-per-client in a single process

For now, comment out the random sleep if a complete call fails after writing a file, because it produces way too much arbitrary interference

Add 5ms sleep instead of 400ms sleep

Add instrumentation of DN heartbeats

Bad import removed

Instrument more background tasks

Add a hacky addition to allow kinda throttling of background block replication

Oh, and make methods static

Fix divide by zero exception

Try alternative approach to replication throttling using the balancer bandwidth

Also default to large balancer bandwidth

Use both approaches simultaneously!

Removed balancer bandwidth - unnecessary

Moved the AspectJ stuff to the root pom.  I'm really not sure which one it's supposed to go in though.

Indentation in pom

Add showWeaveInfo=true to pom

Add throttling points to HDFS

Rename the throttling point on the RPC server

Put servername in the call queue to allow multiple servers

Fix up the propagation of X-Trace metadata from NodeManager to Containers (specifically, YarnChild and MRAppMaster containers)

Let the examples in the examples jar start X-Trace tasks, useful for debugging and verifying that everything is working

Base64 encoding isn't quite compatible with HTTP headers because it uses a disallowed character '=' for padding.  For now just switch to Base16, which is less efficient but only includes alphanumeric characters.  Could tweak X-Trace base64 to not include padding but don't want to risk violating any existing instrumentation, no time to fix it
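For illustration, a Base16 round trip of the kind this commit describes might look like the sketch below. The class `Base16Header` and its methods are hypothetical and are not the X-Trace or Hadoop API; the point is only that the encoded form contains hex digits and no `=` padding, at the cost of two characters per byte.

```java
/** Hypothetical helper: hex-encode opaque metadata bytes for use in a header value. */
final class Base16Header {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private Base16Header() {}

    /** Encodes each byte as two uppercase hex digits; no padding is ever produced. */
    static String encode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** Decodes a hex string produced by encode(); rejects malformed input. */
    static byte[] decode(String s) {
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character in input");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

Both sides of the connection have to agree on the encoding: a reader that still expects Base64 will fail to parse the hex string, so the sender and receiver need to switch together.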

Fixed up the inclusion of xtrace metadata in the shuffle header

Put throttling queue only on NN server for now

Only add call queue instrumentation for namenode for now

Foolishly forgot to use base 16 decoding of XTrace header in shuffle handler

Add throttling point to ShuffleHandler and to RawFileSystem

Add some special handling for MR shuffle handler network output

Remove previous commit instrumentation of shuffle handler, was too much

Remove throttling point from local file system, put in spill thread

Add some cpu tracking... test

Put throttling points in new position

remove spill throttler

Share throttling point for ifile

Do this manually...

Manually disable datanode hostname check. Later versions of HDFS make this
configurable.

Add HDFS config option to specify whether to fadvise long files

Add some logging for datanode selection

Revert "Add some logging for datanode selection"

This reverts commit 220170e.
…ownsys-hadoop into brownsys-pivottrace-210-beta